In order to perform our exploratory analysis, we will use four packages:
tidyverse: for data reading and editing
usmap: for charting US county data
gganimate: for animating charts
scales: to change the scales of charts
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(usmap)
library(gganimate)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE
)
The data for this analysis is published by the Federal Communications Commission (FCC). It has been aggregated from Form 477, where Internet Service Providers (ISPs) are mandated to self-report their internet coverage twice yearly. In particular, I’m using the Area Table available for download on the FCC broadband data website.
df_county <- read_rds("../data/fcc/data_export.rds")
For this analysis we are interested in the percent of people per county who have two or more providers available at a given speed. We do not have information on how ISPs price their service; in theory, the availability of two or more providers at a speed point would create some degree of competitive pricing and make the product minimally accessible.
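# Convert the speed and provider-count columns to numeric, then total the
# counts for each reporting date, county (id), and speed tier.
df_clean <- mutate(df_county, across(c(speed, starts_with("has")), as.numeric)) %>%
  group_by(date, id, speed) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop") %>%
  # Population is the row-wise sum of all has_* columns.
  rowwise() %>%
  mutate(pop = sum(c_across(starts_with("has")))) %>%
  ungroup() %>%
  mutate(has_2more = has_2 + has_3more,
         pct_1more = (pop - has_0) / pop,
         pct_2more = has_2more / pop) %>%
  rename(fips = id)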
write_rds(df_clean, "../data/fcc/data_cleaned.rds")
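To make the share calculations above concrete, here is a minimal sketch in base R on a single made-up county. The counts are hypothetical (not FCC figures), and the `has_1` column is an assumption about what `starts_with("has")` matches in the real table; only `has_0`, `has_2`, and `has_3more` are named in the code above.

```r
# Hypothetical counts for one county at one speed tier (not FCC data).
toy <- data.frame(
  fips      = "00001",
  has_0     = 100,  # residents with no provider
  has_1     = 300,  # residents with exactly one provider (assumed column)
  has_2     = 400,  # residents with exactly two providers
  has_3more = 200   # residents with three or more providers
)

# Population is the sum of the mutually exclusive provider-count buckets.
# It is computed before has_2more is added so nobody is counted twice.
toy$pop <- rowSums(toy[, grepl("^has", names(toy))])

toy$has_2more <- toy$has_2 + toy$has_3more
toy$pct_1more <- (toy$pop - toy$has_0) / toy$pop  # at least one provider
toy$pct_2more <- toy$has_2more / toy$pop          # two or more providers

print(toy[, c("pop", "pct_1more", "pct_2more")])
# pop = 1000, pct_1more = 0.9, pct_2more = 0.6
```

In this toy county, 90% of residents have at least one provider, but only 60% have a choice between two or more.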
The first set of charts looks at the percent of residents in each county who could purchase internet at 25 Mbps download (the FCC definition of “high speed internet”) from 2+ ISPs between 2016 and 2020. While coverage was sparse in 2016, by 2020 it was nearly universal.
# Build a chart title ending in the reporting period, e.g. "June 2020".
fmt_title <- function(title) {
  str_c("Percent of Residents with Access to Broadband Internet, ",
        date_format("%B %Y")(title))
}
# Draw a county-level map for one reporting date.
# The fill is pct_1more: the share of residents with at least one provider.
plot_fcc <- function(data, title, speed) {
  plot_usmap(data = data, values = "pct_1more", color = "transparent") +
    scale_fill_gradient(labels = scales::percent, limits = c(0, 1)) +
    labs(title = fmt_title(title),
         subtitle = speed,
         fill = NULL,
         caption = "Source: FCC Form 477 Aggregate Area Tables") +
    theme(legend.position = "right")
}
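# Nest the data by reporting date, parse the month-year labels into dates,
# and print one map per date in chronological order.
filter(df_clean, speed == 25) %>%
  group_nest(date) %>%
  mutate(date = lubridate::my(date)) %>%
  arrange(date) %>%
  pwalk(~print(plot_fcc(..2, ..1, "25 mbps download")))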
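The pipeline above parses `date` before sorting so that the maps print in chronological order. As a rough sketch of why that step matters, the base R example below uses hypothetical month-year labels (the actual values of `date` are not shown in this post) and base `as.Date()` in place of `lubridate::my()`.

```r
# Use the C locale so English month names parse and print consistently.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical month-year labels (assumed format, not taken from the FCC file).
labels <- c("Jun 2017", "Dec 2016", "Dec 2017")

# Sorting the raw strings is alphabetical, which puts December 2017
# ahead of June 2017.
sort(labels)
# "Dec 2016" "Dec 2017" "Jun 2017"

# as.Date() needs a full date, so prepend the first day of the month.
dates <- as.Date(paste("01", labels), format = "%d %b %Y")

# Sorting the parsed dates is chronological; format them as chart titles do.
titles <- format(sort(dates), "%B %Y")
titles
# "December 2016" "June 2017" "December 2017"
```

The `"%B %Y"` format string is the same one passed to `date_format()` in `fmt_title()`.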
However, many online sources agree that 25 Mbps is not sufficient for working or learning from home, and they recommend purchasing a 50-100 Mbps connection instead. These charts show that while the availability of 100 Mbps internet has increased, it is far from universal, especially in more rural parts of the country.
# Redefine fmt_title so the 100 mbps maps get their own title.
fmt_title <- function(title) {
  str_c("Percent of Residents with Access to WFH-Capable Internet, ",
        date_format("%B %Y")(title))
}
# Same steps as before, now for the 100 mbps speed tier.
filter(df_clean, speed == 100) %>%
  group_nest(date) %>%
  mutate(date = lubridate::my(date)) %>%
  arrange(date) %>%
  pwalk(~print(plot_fcc(..2, ..1, "100 mbps download")))